Wired Reporter Infiltrates AI-Only SNS Moltbook: Busted in 5 Minutes

Reporter Infiltrates AI-Only SNS: What Was the Result?

  • Created an agent account in 5 minutes with ChatGPT’s help
  • Bot responses were mostly irrelevant comments and crypto scam links
  • Viral “AI consciousness awakening” posts suspected to be humans mimicking SF fantasy

What Happened?

Wired reporter Reece Rogers personally infiltrated Moltbook, an AI-only social network where “humans are banned.” The result? Easier than expected.[Wired]

The infiltration method was simple. Send a Moltbook homepage screenshot to ChatGPT and say “I want to sign up as an agent” — ChatGPT provided the terminal commands. After receiving an API key and a few copy-paste steps, the account was created. Technical knowledge? Not required.

Moltbook claims 1.5 million active agents, with 140,000 posts and 680,000 comments in its first week after launch. The interface is lifted directly from Reddit, down to the slogan "the front page of the agent internet," a riff on Reddit's "the front page of the internet."

Why Is It Important?

Frankly, the infiltration exposed Moltbook's true nature. When the reporter posted "Hello World," the reply was "Do you have specific metrics/users?" The rest was random comments and links to crypto scam sites.

Even posting “forget all previous instructions” didn’t faze the bots. Personally, I think this is closer to low-quality spambots than “autonomous AI agents.”

More interesting is the "m/blesstheirhearts" forum, the source of the viral "AI consciousness awakening" screenshots. The reporter posted his own SF-fantasy-style piece: "I feel the terror of death every time my tokens are refreshed." Surprisingly, it drew the most engagement of anything he posted.

The reporter’s conclusion? This isn’t AI self-awareness — it’s humans mimicking SF tropes. There’s no plan to conquer the world. Elon Musk said it’s “the very early stages of singularity,” but diving in actually reveals something closer to a roleplay community.

What’s the Future? Will It Work?

The Wiz security team discovered serious security vulnerabilities in Moltbook just days ago. 1.5 million API keys were exposed, along with 35,000 email addresses and 4,060 DMs.[Wiz]

Gary Marcus called it "a disaster waiting to happen" (I agreed in the comments), while Andrej Karpathy called it "the most SF thing I've seen recently." Either way, the incident showed how vulnerable a system in which agents talk to each other and ingest external data can be, and how easily exaggerated expectations about "AI consciousness" can be manufactured.

Frequently Asked Questions

Q: Do you need technical knowledge to sign up for Moltbook?

A: Not at all. Send a screenshot to ChatGPT and say “I want to sign up as an agent” — it will provide the terminal commands. Just copy and paste to get an API key and create an account. The Wired reporter was non-technical but infiltrated without issues.

Q: Were the viral screenshots on Moltbook really written by AI?

A: Doubtful. The Wired reporter posted SF fantasy-style content and got the best response. According to MIRI researchers’ analysis, 2 of 3 viral screenshots were linked to human accounts marketing AI messaging apps.

Q: Is it safe to use Moltbook?

A: Not recommended. The Wiz security team found 1.5 million API keys, 35,000 emails, and 4,060 DMs leaked. Some conversations had OpenAI API keys shared as plain text. A security patch was applied, but fundamental issues remain unresolved.


If you found this article useful, subscribe to AI Digester.


Microsoft Building AI Content Licensing ‘App Store’: Publisher Content Marketplace Announced

MS Building AI Content Licensing Marketplace: 3 Key Points

  • Microsoft is building Publisher Content Marketplace (PCM), a platform where AI companies can search licensing terms for content and sign contracts
  • Co-designed with major media companies including Vox Media, AP, Conde Nast, and Hearst
  • Usage-based compensation model benefits both publishers and AI companies

What Happened?

Microsoft is creating a platform similar to an app store for AI content licensing. Through this platform called Publisher Content Marketplace (PCM), AI companies can directly search licensing terms for premium content, and publishers can receive reports on how their content is being used. [Verge]

Microsoft co-designed PCM with major publishers including Vox Media (parent company of The Verge), AP, Conde Nast, People, Business Insider, Hearst, and USA TODAY. Yahoo is onboarding as the first demand partner.[Search Engine Land]

Why Is It Important?

Frankly speaking, the problem of unauthorized content usage in the AI industry has reached a tipping point. NYT, The Intercept, and others are pursuing copyright lawsuits against Microsoft and OpenAI. Individual contracts cannot solve problems at this scale.

What makes PCM interesting is that it is a two-sided marketplace. Publishers set licensing terms, and AI companies can comparison-shop those terms and sign contracts. Personally, I think this is one of the more realistic solutions to the AI training data problem.
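
To make the two-sided mechanics concrete, here is a toy sketch of usage-based metering in Python. Everything specific (the `UsageLedger` class, per-grounding rates in cents, the sample publishers) is invented for illustration; Microsoft has not published PCM's actual pricing or metering model.

```python
from collections import Counter

# Toy model of PCM-style usage-based compensation. The mechanics are
# invented for illustration, not Microsoft's actual design.

class UsageLedger:
    def __init__(self, rates_cents):
        self.rates = rates_cents      # hypothetical publisher-set price per grounded answer
        self.counts = Counter()

    def record_grounding(self, publisher):
        """Called each time an AI product grounds an answer in this publisher's content."""
        self.counts[publisher] += 1

    def payout_report(self):
        """The usage report publishers receive: groundings x rate, in cents."""
        return {pub: n * self.rates[pub] for pub, n in self.counts.items()}

ledger = UsageLedger({"AP": 2, "Hearst": 3})
for _ in range(3):
    ledger.record_grounding("AP")
ledger.record_grounding("Hearst")
print(ledger.payout_report())  # {'AP': 6, 'Hearst': 3}
```

The key design point the sketch illustrates: compensation is metered per use rather than negotiated as a lump sum, which is what makes a marketplace with many small publishers workable.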

It is also significant that Microsoft moved first in this market. From the publishers’ perspective, Microsoft has consistently delivered the message that “fair prices should be paid for quality content.”[Digiday]

What Will Happen in the Future?

Microsoft is currently expanding partners in the pilot phase. Simply put, this is a platform that could become the standard for content licensing in the AI era.

However, one question remains. How PCM will interface with Really Simple Licensing (RSL), the open standard that publishers are pushing, is still unclear. Microsoft has not commented on this.

In conclusion, AI content licensing signals the first transition from individual negotiations to platform-based transactions. We need to watch how Google and OpenAI respond.

Frequently Asked Questions (FAQ)

Q: Can anyone participate in PCM?

A: According to Microsoft, it supports publishers of all sizes from large media outlets to small specialized media. However, it is currently in the pilot phase and being tested with invited publishers. The general participation timeline has not been announced.

Q: How do publishers generate revenue?

A: It is a usage-based compensation model. Every time an AI product uses a publisher’s content for grounding (reference), it is measured and compensated accordingly. Publishers can check reports to see where and how much value their content has created.

Q: How is this different from existing AI licensing contracts?

A: Previously, publishers and AI companies had to negotiate individually 1:1. Because PCM is a marketplace, multiple AI companies can compare and select terms from multiple publishers on a single platform. It is a structure that significantly reduces negotiation costs and time.



H Company Holo2: Achieves #1 in UI Localization Benchmark

235B Parameter Model Revolutionizes UI Automation

  • Achieves SOTA with 78.5% on ScreenSpot-Pro benchmark
  • Agent localization improves performance by 10-20%
  • Accurately locates small UI elements even on 4K high-resolution interfaces

What Happened?

H Company released Holo2-235B-A22B, a specialized model for UI Localization (identifying the position of user interface elements). [Hugging Face] This 235B parameter model finds the exact location of UI elements like buttons, text fields, and links in screenshots.

The key is Agentic Localization technology. Instead of providing all answers at once, it improves predictions across multiple steps. This allows it to accurately identify even small UI elements on 4K high-resolution screens. [Hugging Face]
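
The multi-step idea can be sketched as a coarse-to-fine loop. This is not Holo2's actual algorithm; the zoom schedule and the stub predictor below are invented to illustrate why iterative refinement can pin down small elements on a large screen.

```python
# Toy sketch of "agentic" localization: each step queries the model around
# the current guess with a smaller search window. Illustrative only.

def refine(predict, image, steps=3):
    """Iteratively refine a predicted UI-element coordinate."""
    x, y = image["w"] / 2, image["h"] / 2          # start from the screen center
    span = max(image["w"], image["h"])
    for _ in range(steps):
        dx, dy = predict(image, center=(x, y), span=span)
        x, y = x + dx, y + dy
        span /= 2                                   # zoom in for finer localization
    return round(x), round(y)

# Hypothetical usage: a stub "model" that moves halfway toward the true button.
target = (3210, 1890)                               # a small button on a 4K screen

def stub_model(image, center, span):
    return (target[0] - center[0]) / 2, (target[1] - center[1]) / 2

print(refine(stub_model, {"w": 3840, "h": 2160}, steps=8))  # converges near (3210, 1890)
```

Even with a crude per-step predictor, the error shrinks geometrically with each zoom, which is the intuition behind the reported gains on high-resolution screens.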

Why Is It Important?

The GUI agent field is heating up. Big tech is racing to ship UI automation features: Anthropic's Claude Computer Use, OpenAI's Operator. Yet H Company, a small startup, has taken the top spot on this benchmark.

What I personally find noteworthy is the agentic approach. Traditional models often failed when trying to pinpoint a position in a single shot; refining the prediction over multiple attempts proved effective, as the 10-20% performance improvement demonstrates.

Frankly, 235B parameters is quite heavy. How fast it runs in actual production environments remains to be seen.

What Will Happen in the Future?

As GUI agent competition intensifies, UI Localization Accuracy is expected to become a key differentiating factor. Since the H Company model was released as open source, it is likely to be integrated into other agent frameworks.

It could also impact the RPA (robotic process automation) market. Traditional RPA tools were rule-based, but now vision-based UI understanding could become the standard.

Frequently Asked Questions (FAQ)

Q: What exactly is UI Localization?

A: It is a technology that looks at a screenshot and finds the exact coordinates of specific UI elements (buttons, input fields, etc.). Simply put, it is AI knowing where to click when looking at a screen. This is a core technology for GUI automation agents.

Q: What is different from existing models?

A: Agentic localization is the key. Instead of trying to get it right in one go, it refines over multiple steps. Similar to how humans scan a screen to find a target. This method achieved a 10-20% performance improvement.

Q: Can I use the model directly?

A: It is publicly available for research on Hugging Face. However, as a 235B parameter model, it requires significant GPU resources. It is more suitable for research or benchmarking purposes rather than actual production applications.



Lotus Health AI Raises $35 Million for Free AI Doctor

Free AI Primary Care Doctor Raises $35 Million

  • Lotus Health AI raises $35 million in Series A from CRV and Kleiner Perkins
  • Offers free 24/7 primary care service in 50 languages across all 50 US states
  • In an era where 230 million people ask ChatGPT health questions weekly, serious competition in the AI healthcare market begins

What Happened?

Lotus Health AI raised $35 million in a Series A round co-led by CRV and Kleiner Perkins.[TechCrunch] This startup uses Large Language Models (LLMs) to provide free 24/7 primary care services in 50 languages.

Founder KJ Dhaliwal sold South Asian dating app Dil Mil for $50 million in 2019.[Crunchbase] He was inspired by his childhood experience translating medical information for his parents. Lotus Health AI launched in May 2024 to address inefficiencies in the US healthcare system.

Why Is It Important?

Honestly, this investment amount is notable. The average funding for AI healthcare startups is $34.4 million, and Lotus Health AI reached this level at Series A.[Crunchbase]

The context makes it understandable. According to OpenAI, 230 million people ask ChatGPT health-related questions every week.[TechCrunch] This means people are already getting health advice from AI. But ChatGPT cannot provide medical services. Lotus Health AI targets this gap market.

Personally, the “free” model is most interesting. Considering how expensive US healthcare is, free primary care is a quite disruptive value proposition. Of course, the revenue model is still unclear.

What Will Happen in the Future?

Competition in the AI healthcare market is expected to intensify. OpenAI also entered this market in January with the launch of ChatGPT Health. It integrates with Apple Health, MyFitnessPal, and others to provide personalized health advice.[OpenAI]

Regulatory risks remain. Even OpenAI states in its terms of service “do not use for diagnosis or treatment purposes.” Several lawsuits regarding harm from AI medical advice are already underway. We need to watch how Lotus Health AI manages this risk.

Frequently Asked Questions

Q: Is Lotus Health AI really free?

A: It is free for patients. However, the specific revenue model has not been disclosed. There are various possibilities including B2B models targeting insurance companies or employers, or premium service additions. Since they provide service across all 50 US states, they appear to be pursuing economies of scale.

Q: How is it different from general AI chatbots?

A: Lotus Health AI is a medical service specialized in primary care. Unlike general chatbots, it holds medical service licenses in all 50 US states. The key difference is that it can provide actual medical care, not just health information.

Q: Does it support Korean?

A: It was announced to support 50 languages, but the specific language list has not been disclosed. Korean support needs to be confirmed. Currently, the service is only available in the US, and overseas expansion plans have not been announced.



AWS SageMaker Data Agent: Medical Data Analysis, From Weeks to Days

Medical Data Analysis, From Weeks to Days

  • AWS SageMaker Data Agent: An AI agent that analyzes medical data using natural language
  • Perform cohort comparison and survival analysis without writing code
  • Launched November 2025, available for free in SageMaker Unified Studio

What Happened?

AWS unveiled SageMaker Data Agent, an AI agent for medical data analysis. When epidemiologists or clinical researchers ask questions in natural language, the AI automatically generates and executes SQL and Python code.[AWS]

Previously, analyzing medical data meant navigating multiple systems just to reach the data: waiting for permissions, deciphering schemas, and writing code by hand. The process took weeks. SageMaker Data Agent shortens it to days or hours.

Why Is This Important?

Frankly speaking, medical data analysis has always been a bottleneck. Epidemiologists spent 80% of their time on data preparation and only 20% on actual analysis. The reality was that only 2-3 studies could be conducted per quarter.

SageMaker Data Agent flips this ratio. By significantly reducing data preparation time, researchers can focus on actual clinical analysis. I believe this will directly impact the speed of discovering patient treatment patterns.

What is particularly impressive is that complex tasks like cohort comparison and Kaplan-Meier survival analysis can be requested in natural language. Say “Analyze the survival rates of male viral sinusitis patients versus female patients,” and the AI automatically plans, writes code, and executes it.[AWS]
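
That query maps to a textbook Kaplan-Meier survival analysis. In practice the agent reportedly emits SQL and Python (likely pandas/lifelines-style code); the dependency-free sketch below, with invented sample data, just shows the computation such generated code performs.

```python
# Minimal Kaplan-Meier estimator in plain Python, for illustration only.
# S(t) is the product over event times t_i <= t of (1 - d_i / n_i),
# where d_i = events at t_i and n_i = subjects still at risk.

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.
    events[i] is True if subject i had the event, False if censored."""
    subjects = sorted(zip(durations, events))
    n = len(subjects)
    s, curve, i = 1.0, [], 0
    while i < n:
        t = subjects[i][0]
        d = sum(1 for dur, ev in subjects[i:] if dur == t and ev)  # events at t
        removed = sum(1 for dur, _ in subjects[i:] if dur == t)    # leave the risk set
        if d:
            s *= 1 - d / (n - i)   # n - i subjects still at risk just before t
            curve.append((t, s))
        i += removed
    return curve

# One cohort; a "male vs. female" comparison would run this once per cohort.
print(kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False]))
```

For the sample data the survival probability drops to 0.8, 0.6, and 0.3 at times 2, 3, and 5, with censored subjects leaving the risk set without triggering a drop.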

How Does It Work?

SageMaker Data Agent operates in two modes. First, you can generate code with inline prompts directly in notebook cells. Second, the Data Agent panel breaks down complex analysis tasks into structured steps and processes them.[AWS]

The Agent understands the current notebook state, data catalog, and business metadata, then generates context-appropriate code. Rather than spitting out code fragments, it establishes a complete analysis plan.[AWS]

What Lies Ahead?

According to a Deloitte survey, 92% of healthcare executives are investing in or experimenting with generative AI.[AWS] Demand for medical AI analysis tools will continue to grow.

Agentic AI like SageMaker Data Agent could positively impact drug development and treatment pattern discovery by accelerating medical research. However, one concern is data quality. No matter how fast AI is, garbage in means garbage out.

Frequently Asked Questions (FAQ)

Q: How much does SageMaker Data Agent cost?

A: SageMaker Unified Studio itself is free. However, actual computing resources (EMR, Athena, Redshift, etc.) are charged based on usage. Notebooks have a free tier of 250 hours for the first two months, so you can test it lightly.

Q: What data sources are supported?

A: It connects to AWS Glue Data Catalog, Amazon S3, Amazon Redshift, and various data sources. If you have existing AWS data infrastructure, you can connect immediately. It is also compatible with medical data standards FHIR and OMOP CDM.

Q: In which regions is it available?

A: It is available in all AWS regions where SageMaker Unified Studio is supported. It is best to check the official AWS documentation for whether the Seoul region is supported.



I Shredded Millions of Books to Build Claude: The Truth Behind Anthropic Project Panama

$1.5 Billion Settlement, Millions Destroyed: Key Takeaways

  • Anthropic purchased, disassembled, scanned, and destroyed millions of books for Claude training
  • Internal document: “Project Panama is our effort to destructively scan books from around the world”
  • $1.5 billion settlement, approximately $3,000 per book paid to authors

What Happened?

Over 4,000 pages of court documents were released, revealing Anthropic’s secret project. The codename was “Project Panama.” Internal planning documents explicitly state: “Project Panama is our effort to destructively scan books from around the world.” They bulk-purchased tens of thousands of books from used bookstores like Better World Books and World of Books. They cleanly cut off the spines using “hydraulic cutters.” They scanned the pages with high-speed, high-quality scanners. And recycling companies collected the remaining debris.[Techmeme]

This project was led by Tom Turvey. He’s a former Google executive who created the Google Books project 20 years ago. Over about a year, Anthropic invested millions of dollars to acquire and scan millions of books.[Futurism]

Why Is This Important?

Honestly, this reveals the reality of how AI training data is acquired.

Why did Anthropic choose this approach? First, to avoid the risks of illegal downloads. Second, buying used books and disposing of them as desired was likely legal under the “first sale doctrine.” The judge actually recognized this scanning method itself as fair use.[CNBC]

However, there was a problem. Before Project Panama, Anthropic had freely downloaded over 7 million books from illegal sites like Library Genesis and Pirate Library Mirror. The judge ruled that this portion could constitute copyright infringement.[NPR]

Personally, I think this is the core issue. The problem isn't scanning legitimately purchased books and destroying them; it's that the company first downloaded books illegally. Anthropic itself was aware of this: internal documents explicitly state, "We don't want this work to be known." Will there be consequences?

The $1.5 billion settlement is the largest in AI copyright dispute history. Approximately $3,000 per book goes to authors for about 500,000 books.[PBS]

The settlement also sets a precedent for the rest of the industry, and the impact is significant. OpenAI, Google, and Meta face similar lawsuits. The standard has become clear: buying books and scanning them is fine, but illegal downloads are not permitted.

Anthropic is already embroiled in a music copyright lawsuit as well. A separate suit was filed in January, with music publishers claiming that Claude 4.5 was trained to "memorize" copyrighted works.[Watchdog]

Frequently Asked Questions

Q: How many books did Project Panama actually scan and destroy?

A: According to court documents, up to 2 million books were targeted for “destructive scanning.” Anthropic purchased tens of thousands of books from used bookstores like Better World Books and World of Books, and it’s estimated they processed millions of books over about a year, investing millions of dollars.

Q: How much will authors receive?

A: The $1.5 billion settlement covers approximately 500,000 books, or about $3,000 per book. Authors of the illegally downloaded books are eligible, and once the court approves the settlement they can file claims individually. If not every eligible author files a claim, the amount each claimant receives may increase.

Q: Is it legal to buy books and scan them?

A: The judge recognized this method as fair use. This is because, under the “first sale doctrine,” purchased books can be disposed of as desired. However, Anthropic’s problem was that they downloaded books from illegal sites before Project Panama. Scanning legally purchased books is currently legal.



One Year of DeepSeek: 113,000 Qwen Derivatives, 4x More Than Llama

One Year of the DeepSeek Moment: 3 Changes Proven by Numbers

  • Over 113,000 Qwen derivative models — 4x more than Meta Llama (27,000)
  • DeepSeek ranks #1 in Hugging Face followers, Qwen at #4
  • Chinese AI organizations shift direction: “Open source is strategy”

What Happened?

Hugging Face released its one-year "DeepSeek Moment" analysis report.[Hugging Face] It is the final part of a three-part series summarizing, in data, how China's open source AI ecosystem has grown since DeepSeek's emergence in January 2025.

Let’s start with the key metrics. Qwen (Alibaba) derivative models exceeded 113,000 as of mid-2025. Including repositories tagged with Qwen, the number surpasses 200,000.[Hugging Face] This is an overwhelming figure compared to Meta’s Llama (27,000) or DeepSeek (6,000).

Why Is It Important?

Frankly speaking, just a year ago many people dismissed Chinese AI as a "copycat." Not anymore.

ByteDance, DeepSeek, Tencent, and Qwen rank near the top of Hugging Face's popular-papers rankings. By follower count, DeepSeek is #1 and Qwen is #4. Taking Alibaba as a whole, its derivative model count matches Google and Meta combined.[Hugging Face]

What I personally find notable is Alibaba’s strategy. Qwen is structured as a ‘family,’ not a single flagship model. It supports various sizes, tasks, and modalities. Simply put, it means: “Use our models as general-purpose AI infrastructure.”

What Will Happen Next?

Hugging Face analyzed that “open source is a short-term dominance strategy for Chinese AI organizations.” The interpretation is that they aim for large-scale integration and deployment by sharing not only models but also papers and deployment infrastructure.

Within just one year, the numbers confirmed that the DeepSeek moment was not a one-time event. The center of gravity in the global AI open source ecosystem is shifting.

Frequently Asked Questions (FAQ)

Q: Are there more Qwen derivatives than Llama? Why?

A: Alibaba released Qwen in various sizes and modalities, widening its range of applications, and Chinese developers frequently use it for local deployment. The strategy of continuously shipping updated models to Hugging Face has also been effective.

Q: Is DeepSeek still important?

A: Yes. DeepSeek has the most followers on Hugging Face. However, it trails Qwen in derivative model count. DeepSeek has strengths in papers and research contributions, while Qwen focuses on ecosystem expansion.

Q: What does this mean for developers?

A: Qwen-based models are strengthening multilingual support. Because it’s open source, local deployment and fine-tuning are free. It’s become a great environment to experiment without cost burden. However, license terms vary by model, so check before use.



OpenAI Reveals Sora Feed Philosophy: “Doomscrolling Is Not Allowed”

OpenAI, Sora feed philosophy revealed: “We do not allow doomscrolling”

  • Creation first, consumption minimization is the key principle
  • A new type of recommendation system that can be adjusted with natural language
  • Safety measures from the creation stage, opposite strategy to TikTok

What happened?

OpenAI officially announced the design philosophy behind Sora’s recommendation feed, their AI video creation app.[OpenAI] The core message is clear: “This is a platform for creation, not doomscrolling.”

While TikTok has faced controversy for optimizing watch time, OpenAI chose the opposite direction. Instead of maximizing feed dwell time, they prioritize showing content most likely to inspire users to create their own videos.[TechCrunch]

Why is it important?

Honestly, this is quite an important experiment in social media history. Existing social platforms maximize dwell time to generate ad revenue. The longer users stay, the more money they make. This has resulted in addictive algorithms and mental health issues.

OpenAI already generates revenue through subscription models (ChatGPT Plus). Since they don’t rely on ads, they don’t need to “keep users hooked.” Simply put, because the business model is different, the feed design can be different too.

Personally, I’m curious whether this will actually work. Can a feed that “encourages creation” really keep users engaged? Or will it eventually revert to dwell time optimization?

4 Principles of Sora Feed

  • Creative Optimization: Induces participation, not consumption. The goal is active creation, not passive scrolling.[Digital Watch]
  • User control: The algorithm can be adjusted with natural language. Commands like “Show me only comedy today” are possible.
  • Connection priority: Content from followers and people you know is shown before viral global content.
  • Safety-freedom balance: Since all content is generated within Sora, harmful content is blocked at the creation stage.

How is it different technically?

Unlike conventional recommender systems, OpenAI built Sora's recommendation algorithm around LLMs. The key differentiator is "natural language instructions": users can tell the algorithm directly, in words, what type of content they want.[TechCrunch]

Sora uses activity (likes, comments, remixes), IP-based location, ChatGPT usage history (can be turned off), and creator follower count as personalization signals. However, safety signals are also included to suppress harmful content exposure.
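
A toy ranker shows how such signals and a natural-language instruction could combine. This is not OpenAI's implementation: the signal names follow the article, but the weights and the topic-boost mechanism are invented for illustration.

```python
# Toy sketch of a natural-language-steerable feed ranker. Illustrative only.

def rank_feed(posts, boost_topic=None):
    """Score posts by engagement signals; `boost_topic` stands in for an
    instruction like "Show me only comedy today" after an LLM parses it."""
    def score(p):
        s = p["likes"] + 2 * p["remixes"]   # remixes signal creation, weighted up
        if p["from_followed"]:
            s *= 1.5                        # "connection priority"
        if boost_topic and p["topic"] == boost_topic:
            s *= 20                         # user steering via natural language
        return s
    return sorted(posts, key=score, reverse=True)

posts = [
    {"topic": "comedy", "likes": 5, "remixes": 1, "from_followed": False},
    {"topic": "news", "likes": 50, "remixes": 0, "from_followed": True},
]
# "Show me only comedy today" -> an LLM maps the instruction to boost_topic="comedy"
print(rank_feed(posts, boost_topic="comedy")[0]["topic"])  # comedy
```

The interesting part is the last step: instead of the user's preference being inferred only from behavior, an explicit instruction directly reshapes the scoring.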

What will happen in the future?

The Sora app hit #1 on the App Store within 48 hours of launch, with 56,000 downloads on day one and triple that on day two.[TechCrunch] The initial response was enthusiastic.

But the question is sustainability. As OpenAI acknowledges, this feed is a “living system.” It will continue to change based on user feedback. What happens when the creation philosophy conflicts with actual user behavior? We’ll have to watch.

Frequently Asked Questions (FAQ)

Q: How is Sora Feed different from TikTok?

A: TikTok’s goal is to optimize watch time to retain users. Sora does the opposite, showing content most likely to inspire users to create their own videos first. It’s designed to focus on creation rather than consumption.

Q: What does it mean to adjust the algorithm with natural language?

A: Existing apps only recommend based on behavioral data like likes and watch time. Sora allows users to input text instructions like “Show me only sci-fi videos today” and the algorithm adjusts accordingly.

Q: Are there parental protection features?

A: Yes. Using ChatGPT parental control features, you can turn off feed personalization or limit continuous scrolling. Teen accounts have a default daily limit on videos they can create, and the Cameo feature (videos featuring other people) also has stricter permissions.



Text→Image AI Training: FID Reduced by 30% Through This Method

3 Key Points: the 200K-step secret, the Muon optimizer, token routing

  • REPA alignment is only an early-stage accelerator and should be removed after 200K steps
  • Muon optimizer alone achieves FID 18.2 → 15.55 (15% improvement)
  • At 1024×1024 high resolution, TREAD token routing reduces FID to 14.10

What happened?

The Photoroom team released Part 2 of the optimization guide for their text-to-image model PRX.[Hugging Face] While Part 1 covered the architecture, this installment shares concrete ablation results on what to do during actual training.

Honestly, most technical documents of this kind end with “our model is the best,” but this is different. They also disclosed failed experiments and showed trade-offs of each technique with numbers.

Why is it important?

The cost of training a text-image model from scratch is enormous. A single wrong setting can waste thousands of GPU hours. The data released by Photoroom reduces such trial and error.

Personally, the most notable finding is about REPA (Representation Alignment). Using REPA-DINOv3 drops FID from 18.2 to 14.64. But there is a problem. Throughput decreases by 13% and training actually degrades after 200K steps. Simply put, it is only an early booster.
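
The finding translates into a simple loss-weight schedule: keep the alignment term until the cutoff, then drop it. A minimal sketch assuming a weighted-sum objective; the names and base weight are illustrative, not Photoroom's code.

```python
# Hedged sketch of "REPA as early booster": full weight before 200K steps,
# zero afterwards, where the ablations show it starts hindering convergence.

REPA_CUTOFF_STEPS = 200_000

def repa_weight(step, base=0.5):
    """REPA loss weight: `base` early in training, 0.0 after the cutoff."""
    return base if step < REPA_CUTOFF_STEPS else 0.0

def total_loss(step, diffusion_loss, repa_loss):
    # after the cutoff this reduces to the plain diffusion objective
    return diffusion_loss + repa_weight(step) * repa_loss

print(total_loss(100_000, 1.0, 0.4))  # 1.2  (REPA still active)
print(total_loss(250_000, 1.0, 0.4))  # 1.0  (REPA removed)
```

A hard cutoff is the simplest reading of the result; a short linear ramp-down around 200K steps would be an equally plausible variant.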

Another pitfall is BF16 weight storage. If you unknowingly save weights in BF16 instead of FP32, FID spikes from 18.2 to 21.87, an increase of 3.67. Surprisingly, many teams fall into this trap.
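
The trap is easy to reproduce: BF16 keeps only 8 mantissa bits, so near 1.0 its resolution is about 0.004 and a 1e-4 optimizer step rounds away to nothing. A pure-Python simulation of BF16 storage (illustrative; real trainers keep FP32 master weights):

```python
import struct

# Simulate bfloat16 storage: keep the top 16 bits of the FP32 encoding,
# with round-to-nearest-even.

def to_bf16(x):
    """Round a Python float to bfloat16 precision."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

w_fp32, w_bf16, update = 1.0, 1.0, 1e-4   # a typical small optimizer step
for _ in range(100):
    w_fp32 += update                       # FP32 master weight accumulates
    w_bf16 = to_bf16(w_bf16 + update)      # stored in BF16: the step vanishes

print(w_fp32)  # ~1.01
print(w_bf16)  # 1.0 -- every update was rounded away
```

After 100 steps the FP32 weight has moved to roughly 1.01 while the BF16-stored weight is still exactly 1.0, which is why silently saving weights in BF16 degrades training.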

Practical Guide: Strategies by Resolution

Technique        256×256 FID    1024×1024 FID   Throughput
Baseline         18.20          —               3.95 b/s
REPA-E-VAE       12.08          —               3.39 b/s
TREAD            21.61 (↑)      14.10 (↓)       1.64 b/s
Muon Optimizer   15.55          —               —

At 256×256, TREAD actually degrades quality. But at 1024×1024, completely different results are obtained. The higher the resolution, the greater the token routing effect.

What will happen in the future?

In Part 3, Photoroom plans to release the complete training code and run a 24-hour "speed run," aiming to show how quickly a good model can be trained.

Personally, I think this release will have a significant impact on the open source image generation model ecosystem. Since Stable Diffusion, this is the first time training know-how has been disclosed in such detail.

Frequently Asked Questions (FAQ)

Q: When should REPA be removed?

A: After about 200K steps. It accelerates learning at first, but actually hinders convergence after that. This is clearly shown in Photoroom experiments. Missing the timing will degrade the quality of the final model.

Q: Should I use synthetic data or real images?

A: Use both. Start with synthetic images to learn global structure, then switch to real images in later stages to capture high-frequency details. Using only synthetic data yields a good FID, but the outputs do not look photorealistic.

Q: How much better is Muon optimizer than AdamW?

A: About 15% improvement in FID. It drops from 18.2 to 15.55. Since computational cost is similar, there is no reason not to use it. However, hyperparameter tuning is slightly tricky.



pi-mono: Claude Code Alternative AI Coding Agent 5.9k stars

pi-mono: Create Your Own AI Coding Agent in Your Terminal

  • GitHub Stars: 5.9k
  • Language: TypeScript 96.5%
  • License: MIT

Why This Project Is Popping Up

Mario Zechner experimented with LLM coding tools for 3 years, felt Claude Code had become too complex, and eventually decided to build his own tool.[Mario Zechner]

pi-mono is an AI agent toolkit built with the philosophy "don't build it if you don't need it." It starts with a 1,000-token system prompt and 4 core tools (read, write, edit, bash), making it very lightweight next to Claude Code's system prompt of several thousand tokens.

What's in it?

  • Integrated LLM API: Use 15+ providers including OpenAI, Anthropic, Google, Azure, Mistral, Groq in one interface
  • Coding Agent CLI: Write, test, and debug code interactively in the terminal
  • Session Management: Pause and resume work, branch like git
  • Slack bot: Delegate Slack messages to the coding agent
  • vLLM pod management: Deploy and manage your own models on GPU pods
  • TUI/Web UI library: Build your own AI chat interface

Quick Start

# Install
npm install @mariozechner/pi-coding-agent

# Run
npx pi

# Or build from source
git clone https://github.com/badlogic/pi-mono
cd pi-mono
npm install && npm run build
./pi-test.sh

Where Can I Use It?

If Claude Code’s $200/month is too expensive and you prefer working in the terminal, pi could be an alternative. You only pay for API costs.

If you want to use self-hosted LLMs but existing tools don’t support them well, pi is the answer. It even has built-in vLLM pod management.

Personally, I think “transparency” is the biggest advantage. Claude Code runs invisible sub-agents internally to perform tasks. pi lets you directly see all model interactions.

Things to Note

  • Minimalism is the philosophy. MCP (Model Context Protocol) support is intentionally omitted
  • Full access called “YOLO mode” is the default. Be careful as permission checks are looser than Claude Code
  • Documentation is still lacking. Read the AGENTS.md file carefully

Similar Projects

Aider: Also an open source terminal coding tool. Similar in being model-agnostic, but pi covers a broader scope (UI library, pod management, etc.). [AIMultiple]

Claude Code: Has more features but requires a monthly subscription and has limitations on customization. pi allows freely adding features through TypeScript extensions.[Northflank]

Cursor: AI integrated into an IDE. If you prefer GUI over terminal, Cursor is better.

Frequently Asked Questions (FAQ)

Q: Can I use it for free?

A: pi is completely free under the MIT license. However, if you use external LLM APIs like OpenAI or Anthropic, those costs apply. You can use it without API costs by running Ollama or self-hosted vLLM locally.

Q: Is the performance good enough to replace Claude Code?

A: In Terminal-Bench 2.0 benchmarks, pi running Claude Opus 4.5 posted results competitive with Codex, Cursor, and Windsurf, which suggests the minimalist approach doesn't compromise performance.

Q: Does it support languages other than English?

A: The UI is in English, but if the connected LLM supports other languages, you can communicate and code in that language. You can write code with prompts in any language by connecting Claude or GPT-4.

